
    Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence

    In this paper we introduce evidence transfer for clustering, a deep learning method that incrementally manipulates the latent representations of an autoencoder according to external categorical evidence in order to improve a clustering outcome. By evidence transfer we mean the process by which the categorical outcome of an external, auxiliary task is exploited to improve a primary task, in this case representation learning for clustering. Our proposed method makes no assumptions about the categorical evidence presented or about the structure of the latent space. We compare our method against the baseline solution by performing k-means clustering before and after its deployment. Experiments with three different kinds of evidence show that our method effectively manipulates the latent representations when introduced with real corresponding evidence, while remaining robust when presented with low-quality evidence.
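    The evaluation protocol described above (running the same k-means clustering before and after the representation change) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the authors' code: 1-D points play the role of latent representations, a plain Lloyd's-algorithm k-means plays the role of the clustering step, and purity is one possible quality score.

    ```python
    # Conceptual sketch (hypothetical data and helpers, not the paper's code):
    # score a clustering of the "latent space" before and after a
    # representation change, using the same k-means both times.
    import random

    def kmeans_1d(points, k, iters=50, seed=0):
        """Plain Lloyd's algorithm on 1-D data; returns cluster labels."""
        rng = random.Random(seed)
        centres = rng.sample(points, k)
        labels = [0] * len(points)
        for _ in range(iters):
            # assignment step: each point joins its nearest centre
            labels = [min(range(k), key=lambda c: abs(p - centres[c]))
                      for p in points]
            # update step: move each centre to its cluster mean
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if members:
                    centres[c] = sum(members) / len(members)
        return labels

    def purity(labels, truth):
        """Fraction of points matching their cluster's majority class."""
        correct = 0
        for c in set(labels):
            members = [t for l, t in zip(labels, truth) if l == c]
            correct += max(members.count(t) for t in set(members))
        return correct / len(labels)

    truth = [0, 0, 0, 1, 1, 1]
    before = [0.0, 0.1, 0.2, 0.15, 0.25, 0.35]  # classes overlap in the latent space
    after = [0.0, 0.1, 0.2, 1.15, 1.25, 1.35]   # after a hypothetical evidence transfer

    purity_before = purity(kmeans_1d(before, 2), truth)
    purity_after = purity(kmeans_1d(after, 2), truth)
    ```

    On this toy data the "after" representation separates the classes, so the same k-means now recovers them perfectly, which is the kind of before/after comparison the paper's baseline evaluation performs.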

    DARE Platform: a Developer-Friendly and Self-Optimising Workflows-as-a-Service Framework for e-Science on the Cloud

    The DARE platform, developed as part of the H2020 DARE project (grant agreement No 777413), enables the seamless development and reusability of scientific workflows and applications, and the reproducibility of experiments. Further, it provides Workflow-as-a-Service (WaaS) functionality and dynamic loading of execution contexts in order to hide technical complexity from its end users. This archive includes v3.5 of the DARE platform.

    dispel4py: An Open-Source Python library for Data-Intensive Seismology

    Scientific workflows are a necessary tool for many scientific communities, as they enable easy composition and execution of applications on computing resources while scientists can focus on their research without being distracted by computation management. Nowadays, scientific communities (e.g. seismology) have access to a large variety of computing resources, and their computational problems are best addressed using parallel computing technology. However, successful use of these technologies requires a lot of additional machinery whose use is not straightforward for non-experts: different parallel frameworks (MPI, Storm, multiprocessing, etc.) must be used depending on the computing resources (local machines, grids, clouds, clusters) where applications are run. This implies that to achieve the best application performance, users usually have to change their codes depending on the features of the platform selected for running them. This work presents dispel4py, a new open-source Python library for describing abstract stream-based workflows for distributed data-intensive applications. Special care has been taken to give dispel4py the ability to map abstract workflows to different platforms dynamically at run-time. Currently dispel4py has four mappings: Apache Storm, MPI, multi-threading and sequential. The main goal of dispel4py is to provide an easy-to-use tool for developing and testing workflows on local resources, using the sequential mode with a small dataset. Later, once a workflow is ready for long runs, it can be automatically executed on different parallel resources, with dispel4py taking care of the underlying mappings and performing an efficient parallelisation. Processing Elements (PEs) represent the basic computational activities of any dispel4py workflow; a PE can be, for example, a seismological algorithm or a data transformation process.
To create a dispel4py workflow, users only have to write a few lines of code describing their PEs and how they are connected, using Python, which is widely supported on many platforms and popular in many scientific domains, such as the geosciences. Once a dispel4py workflow is written, the user only has to select a mapping, and everything else (parallelisation, distribution of data) is carried out by dispel4py at no extra cost to the user. Among dispel4py's features we would like to highlight the following:

* The PEs are connected by streams rather than by writing to and reading from intermediate files, avoiding many I/O operations.
* The PEs can be stored in a registry, so different users can recombine PEs into many different workflows.
* dispel4py has been enriched with a provenance mechanism to support runtime provenance analysis. We have adopted the W3C PROV data model, which is accessible via a prototype browser-based user interface and a web API. It supports users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime into dedicated data archives.

dispel4py has already been used by seismologists in the VERCE project to develop different seismic workflows. One of them is the Seismic Ambient Noise Cross-Correlation workflow, which preprocesses and cross-correlates traces from several stations. First, this workflow was tested on a local machine using a small number of stations as input data. Later, it was executed on different parallel platforms (the SuperMUC cluster and the Terracorrelator machine), automatically scaling up to 1000 stations as input data using the MPI and multiprocessing mappings. The results show that dispel4py achieves scalable performance with both mappings tested on different parallel platforms.
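The PE-and-stream model described above can be sketched in plain Python. The PE names and the generator-based "sequential mode" below are hypothetical illustrations of the idea (sources, transformations and sinks coupled by data streams), not the actual dispel4py API.

```python
# Conceptual sketch of stream-connected processing elements (PEs);
# hypothetical names, not the real dispel4py API. Sequential mode is
# modelled by chaining Python generators.
def read_traces(station_ids):
    """Source PE: emit one toy 'trace' per station (made-up data)."""
    for sid in station_ids:
        yield {"station": sid, "samples": [sid * 0.1, sid * 0.2]}

def preprocess(stream):
    """Transformation PE: normalise each trace by its peak amplitude."""
    for trace in stream:
        peak = max(abs(s) for s in trace["samples"]) or 1.0
        yield {**trace, "samples": [s / peak for s in trace["samples"]]}

def cross_correlate(stream):
    """Sink PE: pair consecutive stations, echoing the noise
    cross-correlation workflow's station pairing."""
    previous = None
    for trace in stream:
        if previous is not None:
            yield (previous["station"], trace["station"])
        previous = trace

# Composing the workflow is just chaining the streams; a real engine
# would instead map each PE onto MPI ranks, threads or Storm bolts.
pipeline = cross_correlate(preprocess(read_traces([1, 2, 3])))
pairs = list(pipeline)
```

Because the PEs exchange data only through streams, no intermediate files are written, which is the I/O saving the first bullet point above refers to.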

    Comprehensible Control for Researchers and Developers facing Data Challenges

    The DARE platform enables researchers and their developers to exploit more capabilities to handle complexity and scale in data, computation and collaboration. Today’s challenges pose increasing and urgent demands for this combination of capabilities. To meet technical, economic and governance constraints, application communities must use shared digital infrastructure, principally via virtualisation and mapping. This requires precise abstractions that retain their meaning while their implementations and infrastructures change. Giving specialists direct control over these capabilities, with detail relevant to each discipline, is necessary for adoption. Research agility, improved power and a retained return on intellectual investment incentivise that adoption. We report on an architecture for establishing and sustaining the necessary optimised mappings, and early evaluations of its feasibility with two application communities. Published: San Diego (CA, USA).

    Please call ME.N.U.4EVER: designing for “Callback” in rural Africa

    Proceedings of the Tenth International Workshop on Internationalisation of Products and Systems, Kuching, Malaysia, 11-14 July 2011. Designers and developers are naïve about the ways impoverished people in rural Africa innovate new uses of mobile technology to circumvent access difficulties. Here, we report on the local appropriation of a USSD ‘Callback’ service in a rural community in South Africa’s Eastern Cape, which enables people to send free text messages and includes strategies that respond to severe constraints on message length and to local communication protocols. This report shows that a participative approach, in which community members co-generate methods and interpret data, elicits major and formerly unreported findings. We describe the results of two sets of interviews about the local use of cell-phones and Callback, and the implications of this use for designing and realizing a media-sharing system. Our findings indicate that the community needs a system to charge phones and share media without consuming airtime, with functionality for the 70-80% of people who do not own high-end phones. Use of Callback suggests people will manage a system to create, store and share content at a local ‘station’ but notify others about content using separate networks. Callback use reveals local priorities that shape the meaning of usability and utility; the ways people manage sequences of communication; and the ‘rules’ that enable people to use Callback for multiple purposes and to make sense of Callbacks despite ambiguity. These priorities inform the introduction of prototypes and contribute to exploring the communication patterns that might subsequently emerge.

    DARE: A Reflective Platform Designed to Enable Agile Data-Driven Research on the Cloud

    The DARE platform has been designed to help research developers deliver user-facing applications and solutions over diverse underlying e-infrastructures, data and computational contexts. The platform is Cloud-ready and relies on the exposure of APIs, which are suitable for raising the abstraction level and hiding complexity. At its core, the platform implements the cataloguing and execution of fine-grained, Python-based dispel4py workflows as services. Reflection is achieved via a logical knowledge base comprising multiple internal catalogues, registries and semantics, while the platform supports persistent and pervasive data provenance. This paper presents design and implementation aspects of the DARE platform and provides directions for future development. Published: San Diego (CA, USA).

    dispel4py: An Open Source Python Framework for Encoding, Mapping and Reusing Seismic Continuous Data Streams: Intensive Analysis and Data Mining

    Scientific workflows are needed by many scientific communities, such as seismology, as they enable easy composition and execution of applications, letting scientists focus on their research without being distracted by arranging computation and data management. However, there are challenges to be addressed: in many systems users have to adapt their codes and data movement as they move from one HPC architecture to another, and they still need to be aware of the computing architectures available to achieve the best application performance. We present dispel4py, an open-source framework offered as a Python library for encoding and automating data-intensive scientific methods as a graph of operations coupled together by data streams. It enables scientists to develop and experiment with their own data-intensive applications in their familiar work environment. These are then automatically mapped to a variety of parallel frameworks, i.e. MPI, multiprocessing, Storm and Spark, increasing the chances of reusing applications on different computing resources. dispel4py comes with data provenance, as shown in the screenshot, and with an information registry that can be accessed transparently from within workflows. dispel4py has been enhanced with a new run-time adaptive compression strategy to reduce data-stream volume and a diagnostic tool that monitors workflow performance and computes the most efficient parallelisation to use. dispel4py has been used by seismologists in the VERCE project for seismic ambient noise cross-correlation applications and for orchestrated HPC wave-simulation and data-misfit analysis workflows, two data-intensive problems that are common in today's research practice. Both have been tested on several local computing resources and later submitted to a variety of European PRACE HPC architectures (e.g. SuperMUC and CINECA) for longer runs without change. Results show that dispel4py is an easy tool for developing, sharing and reusing data-intensive scientific methods.
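    The "write once, select a mapping at run time" idea above can be sketched as a dispatch over interchangeable back-ends. The back-end names, the toy pipeline and the `run` helper below are assumptions for illustration only, not dispel4py's real mappings or API.

    ```python
    # Hypothetical sketch: the same abstract pipeline executed by
    # interchangeable back-ends chosen at run time, in the spirit of
    # dispel4py's MPI/multiprocessing/Storm/Spark mappings.
    from concurrent.futures import ThreadPoolExecutor

    def pipeline(item):
        """Abstract workflow applied to one data item (toy stages)."""
        detrended = item - 1          # stand-in for a preprocessing PE
        return detrended * detrended  # stand-in for an analysis PE

    def run(items, mapping="sequential"):
        """Dispatch the unchanged pipeline to the selected back-end."""
        if mapping == "sequential":
            # local testing mode: one item at a time
            return [pipeline(x) for x in items]
        if mapping == "threads":
            # parallel mode: same code, fanned out over worker threads
            with ThreadPoolExecutor(max_workers=4) as pool:
                return list(pool.map(pipeline, items))
        raise ValueError(f"unknown mapping: {mapping}")
    ```

    The point of the sketch is that `pipeline` never changes: only the `mapping` argument does, which mirrors how a dispel4py workflow tested sequentially on a laptop can later be submitted unchanged to a parallel platform.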